-
-
-
- FLEX(1) UNIX Programmer's Manual FLEX(1)
-
-
-
- NAME
- flex - fast lexical analyzer generator
-
- SYNOPSIS
- flex [ -dfirstvFILT -c[efmF] -Sskeleton_file ] [ filename ]
-
- DESCRIPTION
- flex is a rewrite of lex intended to right some of that
- tool's deficiencies: in particular, flex generates lexical
- analyzers much faster, and the analyzers use smaller tables
- and run faster.
-
- OPTIONS
- In addition to lex's -t flag, flex has the following
- options:
-
- -d makes the generated scanner run in debug mode. When-
- ever a pattern is recognized the scanner will write to
- stderr a line of the form:
-
- --accepting rule #n
-
- Rules are numbered sequentially with the first one
- being 1.
-
- -f has the same effect as lex's -f flag (do not compress
- the scanner tables); the mnemonic changes from fast
- compilation to (take your pick) full table or fast
- scanner. The actual compilation takes longer, since
- flex is I/O bound writing out the big table.
-
- This option is equivalent to -cf (see below).
-
- -i instructs flex to generate a case-insensitive scanner.
- The case of letters given in the flex input patterns
- will be ignored, and the rules will be matched regard-
- less of case. The matched text given in yytext will
- have the preserved case (i.e., it will not be folded).
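-
- The effect of -i can be pictured in plain C. The helper
- below is a hypothetical sketch, not code that flex emits:
- it compares pattern and input with case folded for the
- comparison only, so the input text itself keeps the case
- the user typed, much as yytext does.
-

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of flex): returns 1 if the first n
 * characters of text equal the first n characters of pattern when
 * case is ignored, else 0.  Neither string is modified. */
int matches_ignoring_case(const char *pattern, const char *text, size_t n)
{
    size_t i;

    assert(pattern != NULL && text != NULL);
    for (i = 0; i < n; i++) {
        unsigned char p = (unsigned char)pattern[i];
        unsigned char t = (unsigned char)text[i];

        if (p == '\0' || t == '\0')
            return 0;               /* either string is shorter than n */
        if (tolower(p) != tolower(t))
            return 0;               /* differs even after folding */
    }
    return 1;
}
```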
-
- -r specifies that the scanner uses the REJECT action.
-
- -s causes the default rule (that unmatched scanner input
- is echoed to stdout) to be suppressed. If the scanner
- encounters input that does not match any of its rules,
- it aborts with an error. This option is useful for
- finding holes in a scanner's rule set.
-
- -v has the same meaning as for lex (print to stderr a sum-
- mary of statistics of the generated scanner). Many
- more statistics are printed, though, and the summary
- spans several lines. Most of the statistics are mean-
- ingless to the casual flex user.
-
-
-
- Printed 12/28/88 13 May 1987 1
-
- -F specifies that the fast scanner table representation
- should be used. This representation is about as fast
- as the full table representation (-f), and for some
- sets of patterns will be considerably smaller (and for
- others, larger). In general, if the pattern set con-
- tains both "keywords" and a catch-all, "identifier"
- rule, such as in the set:
-
- "case" return ( TOK_CASE );
- "switch" return ( TOK_SWITCH );
- ...
- "default" return ( TOK_DEFAULT );
- [a-z]+ return ( TOK_ID );
-
- then you're better off using the full table representa-
- tion. If only the "identifier" rule is present and you
- then use a hash table or some such to detect the key-
- words, you're better off using -F.
-
- This option is equivalent to -cF (see below).
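-
- The "identifier rule plus lookup" arrangement suggested
- for -F might look like the sketch below. The token codes
- and the function name are assumptions made for
- illustration, and a short linear table stands in for the
- hash table; the action of the [a-z]+ rule would call the
- function on yytext and return its result.
-

```c
#include <assert.h>
#include <string.h>

/* Token codes are assumptions for this sketch; a real scanner would
 * take them from its parser's header. */
enum { TOK_ID = 256, TOK_CASE, TOK_SWITCH, TOK_DEFAULT };

struct keyword {
    const char *name;
    int token;
};

static const struct keyword keywords[] = {
    { "case",    TOK_CASE    },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH  },
};

/* Classify text that the single [a-z]+ rule has already matched:
 * a keyword's token code if the text is a keyword, else TOK_ID. */
int classify_identifier(const char *text)
{
    size_t i;

    assert(text != NULL);
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(text, keywords[i].name) == 0)
            return keywords[i].token;
    return TOK_ID;
}
```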
-
- -I instructs flex to generate an interactive scanner.
- Normally, scanners generated by flex always look ahead
- one character before deciding that a rule has been
- matched. At the possible cost of some scanning over-
- head (it's not clear that more overhead is involved),
- flex will generate a scanner which only looks ahead
- when needed. Such scanners are called interactive
- because if you want to write a scanner for an interac-
- tive system such as a command shell, you will probably
- want the user's input to be terminated with a newline,
- and without -I the user will have to type a character
- in addition to the newline in order to have the newline
- recognized. This leads to dreadful interactive perfor-
- mance.
-
- If all this seems too confusing, here's the general
- rule: if a human will be typing in input to your
- scanner, use -I, otherwise don't; if you don't care
- about how fast your scanners run and don't want to make
- any assumptions about the input to your scanner, always
- use -I.
-
- Note, -I cannot be used in conjunction with full or
- fast tables, i.e., the -f, -F, -cf, or -cF flags.
-
- -L instructs flex to not generate #line directives (see
- below).
-
- -T makes flex run in trace mode. It will generate a lot
- of messages to standard out concerning the form of the
- input and the resultant non-deterministic and
- deterministic finite automatons. This option is mostly
- for use in maintaining flex.
-
- -c[efmF]
- controls the degree of table compression. -ce directs
- flex to construct equivalence classes, i.e., sets of
- characters which have identical lexical properties (for
- example, if the only appearance of digits in the flex
- input is in the character class "[0-9]" then the digits
- '0', '1', ..., '9' will all be put in the same
- equivalence class). -cf specifies that the full
- scanner tables should be generated - flex should not
- compress the tables by taking advantage of similar
- transition functions for different states. -cF speci-
- fies that the alternate fast scanner representation
- (described above under the -F flag) should be used.
- -cm directs flex to construct meta-equivalence classes,
- which are sets of equivalence classes (or characters,
- if equivalence classes are not being used) that are
- commonly used together. A lone -c specifies that the
- scanner tables should be compressed but neither
- equivalence classes nor meta-equivalence classes should
- be used.
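-
- One way to picture the equivalence classes built for -ce
- is the sketch below. It is not flex's own algorithm, and
- every name in it is hypothetical: each character gets a
- bit mask recording which character classes of the input
- contain it, and characters with equal masks share an
- equivalence class.
-

```c
#include <assert.h>
#include <string.h>

#define NCHARS 256

/* Hypothetical sketch, not flex's implementation.  sets[k] lists the
 * members of the k-th character class used in the patterns (at most
 * 32 sets).  On return class_of[c] holds the equivalence class of
 * character c; the return value is the number of classes. */
int build_equivalence_classes(const char *const sets[], int nsets,
                              int class_of[NCHARS])
{
    unsigned long signature[NCHARS];   /* bit k set: c is in sets[k] */
    int nclasses = 0;
    int c, d, k;

    assert(nsets >= 0 && nsets <= 32);
    for (c = 0; c < NCHARS; c++) {
        signature[c] = 0;
        for (k = 0; k < nsets; k++)
            if (c != 0 && strchr(sets[k], c) != NULL)
                signature[c] |= 1UL << k;
    }
    for (c = 0; c < NCHARS; c++) {
        class_of[c] = -1;
        for (d = 0; d < c; d++)        /* reuse an earlier class if any */
            if (signature[d] == signature[c]) {
                class_of[c] = class_of[d];
                break;
            }
        if (class_of[c] < 0)
            class_of[c] = nclasses++;  /* first character of a new class */
    }
    return nclasses;
}
```

- With "0123456789" as the only set, the ten digits land in
- one class and every other character in a second one.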
-
- The options -cf or -cF and -cm do not make sense
- together - there is no opportunity for meta-equivalence
- classes if the table is not being compressed. Other-
- wise the options may be freely mixed.
-
- The default setting is -cem which specifies that flex
- should generate equivalence classes and meta-
- equivalence classes. This setting provides the highest
- degree of table compression. Less compression gives
- faster-executing scanners at the cost of larger tables,
- with the following generally being true:
-
- slowest smallest
- -cem
- -ce
- -cm
- -c
- -c{f,F}e
- -c{f,F}
- fastest largest
-
-
- -Sskeleton_file
- overrides the default skeleton file from which flex
- constructs its scanners. You'll never need this option
- unless you are doing flex maintenance or development.
-
- INCOMPATIBILITIES WITH LEX
- flex is fully compatible with lex with the following excep-
- tions:
-
- - There is no run-time library to link with. You needn't
- specify -ll when linking, and you must supply a main
- program. (Hacker's note: since the lex library con-
- tains a main() which simply calls yylex(), you actually
- can be lazy and not supply your own main program and
- link with -ll.)
-
- - lex's %r (Ratfor scanners) and %t (translation table)
- options are not supported.
-
- - The do-nothing -n flag is not supported.
-
- - When definitions are expanded, flex encloses them in
- parentheses. With lex, the following
-
- NAME [A-Z][A-Z0-9]*
- %%
- foo{NAME}? printf( "Found it\n" );
- %%
-
- will not match the string "foo" because when the macro
- is expanded the rule is equivalent to
- "foo[A-Z][A-Z0-9]*?" and the precedence is such that the
- '?' is associated with "[A-Z0-9]*". With flex, the rule
- will be expanded to "foo([A-Z][A-Z0-9]*)?" and so the
- string "foo" will match.
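-
- The two groupings can be compared with the POSIX regcomp()
- interface, as a rough stand-in for either tool. In the
- sketch below the names ere_matches, lex_style and
- flex_style are hypothetical, the patterns are anchored
- with ^ and $ so that the whole string must match, and
- lex's binding of '?' is spelled out with explicit
- parentheses.
-

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the POSIX extended regular expression
 * pattern, 0 if it does not, and -1 if pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* '?' applies only to "[A-Z0-9]*", as after lex's textual expansion. */
const char lex_style[] = "^foo[A-Z]([A-Z0-9]*)?$";

/* '?' applies to the whole parenthesized definition, as in flex. */
const char flex_style[] = "^foo([A-Z][A-Z0-9]*)?$";
```

- Here ere_matches(lex_style, "foo") is 0 and
- ere_matches(flex_style, "foo") is 1, while both accept
- "fooX1".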
-
- - yymore() is not supported.
-
- - The undocumented lex-scanner internal variable yylineno
- is not supported.
-
- - If your input uses REJECT, you must run flex with the
- -r flag. If you leave out the flag, the scanner will
- abort at run-time with a message that the scanner was
- compiled without the flag being specified.
-
- - The input() routine is not redefinable, though it may be
- called to read characters following whatever has been
- matched by a rule. If input() encounters an end-of-
- file, the normal yywrap() processing is done. A
- ``real'' end-of-file is returned as EOF.
-
- Input can be controlled by redefining the YY_INPUT
- macro. YY_INPUT's calling sequence is
- "YY_INPUT(buf,result,max_size)". Its action is to
- place up to max_size characters in the character buffer
- "buf" and return in the integer variable "result"
- either the number of characters read or the constant
- YY_NULL (0 on Unix systems) to indicate EOF.
- The default YY_INPUT reads from the file-pointer "yyin"
- (which is by default stdin), so if you just want to
- change the input file, you needn't redefine YY_INPUT -
- just point yyin at the input file.
-
- A sample redefinition of YY_INPUT (in the first section
- of the input file):
-
- %{
- #undef YY_INPUT
- /* read into an int so EOF is distinguishable from any char */
- #define YY_INPUT(buf,result,max_size) \
- { \
- int c = getchar(); \
- result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
- }
- %}
-
- You can also add things like keeping track of the input
- line number this way; but don't expect your scanner to
- go very fast.
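-
- As a further illustration of the same calling sequence,
- the sketch below feeds the scanner from a string held in
- memory and hands over up to max_size characters per call.
- The names input_cursor and set_scanner_input are
- hypothetical, YY_NULL is defined as 0 here only so that
- the fragment compiles on its own, and how the skeleton
- invokes the macro is an assumption.
-

```c
#include <assert.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* assumption: the Unix value quoted in the text */
#endif

/* Hypothetical in-memory source used in place of yyin. */
static const char *input_cursor = "";

void set_scanner_input(const char *text)
{
    assert(text != NULL);
    input_cursor = text;
}

/* Place up to max_size characters in buf and set result to the number
 * of characters delivered, or to YY_NULL once the string is used up.
 * max_size is assumed to be positive. */
#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) \
    { \
        size_t yy_n = strlen(input_cursor); \
        if (yy_n > (size_t)(max_size)) \
            yy_n = (size_t)(max_size); \
        memcpy((buf), input_cursor, yy_n); \
        input_cursor += yy_n; \
        (result) = yy_n > 0 ? (int)yy_n : YY_NULL; \
    }
```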
-
- - output() is not supported. Output from the ECHO macro
- is done to the file-pointer "yyout" (default stdout).
-
- - Trailing context is restricted to patterns which have
- either a fixed-sized leading part or a fixed-sized
- trailing part. For example, "a*/b" and "a/b*" are
- okay, but not "a*/b*". This restriction is due to a
- bug in the trailing context algorithm given in Princi-
- ples of Compiler Design (and Compilers - Principles,
- Techniques, and Tools) which can result in mismatches.
- Try the following lex program
-
- %%
- x+/xy printf( "I found \"%s\"\n", yytext );
-
- on the input "xxy". (If anyone knows of a fast algo-
- rithm for finding the beginning of trailing context for
- an arbitrary pair of regular expressions, please let me
- know!) If you must have arbitrary trailing context, you
- can use yyless() to effect it.
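-
- For the sample rule x+/xy, the result a correct matcher
- should produce can be worked out by hand. The function
- below is a hypothetical, hand-coded sketch for that one
- pattern only: it returns the length of the text that
- belongs in yytext, leaving the final "xy" as trailing
- context.
-

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-coded matcher for the rule x+/xy.  Returns the
 * length of the leading part (what yytext should hold) if the pattern
 * matches at the start of s, or 0 if it does not match. */
size_t match_x_plus_before_xy(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] == 'x')
        n++;                    /* n = length of the run of x's */
    if (n < 2 || s[n] != 'y')
        return 0;               /* need at least one x, then "xy" */
    return n - 1;               /* the last x belongs to the context */
}
```

- On the input "xxy" this yields 1, that is, yytext should
- be "x" with "xy" left to be rescanned.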
-
- - flex reads only one input file, while lex's input is
- made up of the concatenation of its input files.
-
- ENHANCEMENTS
- - Exclusive start-conditions can be declared by using %x
- instead of %s. These start-conditions have the property
- that when they are active, no other rules are active.
- Thus a set of rules governed by the same exclusive
- start condition describe a scanner which is independent
- of any of the other rules in the flex input. This
- feature makes it easy to specify "mini-scanners" which
- scan portions of the input that are syntactically
- different from the rest (e.g., comments).
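-
- The idea behind an exclusive start condition can be seen
- in a hand-written scanner that keeps an explicit state
- variable. The sketch below is hypothetical and is not
- what flex generates: while the state is STATE_COMMENT
- only the comment rules are consulted, and while it is
- STATE_INITIAL only the ordinary ones are.
-

```c
#include <assert.h>
#include <stddef.h>

enum scan_state { STATE_INITIAL, STATE_COMMENT };

/* Hypothetical hand-written equivalent of a scanner with one exclusive
 * start condition: copies src to dst with C comments removed.
 * dst must have room for strlen(src) + 1 characters. */
void strip_comments(const char *src, char *dst)
{
    enum scan_state state = STATE_INITIAL;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        switch (state) {
        case STATE_INITIAL:             /* ordinary rules only */
            if (src[0] == '/' && src[1] == '*') {
                state = STATE_COMMENT;  /* enter the mini-scanner */
                src += 2;
            } else {
                *dst++ = *src++;
            }
            break;
        case STATE_COMMENT:             /* comment rules only */
            if (src[0] == '*' && src[1] == '/') {
                state = STATE_INITIAL;  /* leave the mini-scanner */
                src += 2;
            } else {
                src++;                  /* discard comment text */
            }
            break;
        }
    }
    *dst = '\0';
}
```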
-
- - flex dynamically resizes its internal tables, so direc-
- tives like "%a 3000" are not needed when specifying
- large scanners.
-
- - The scanning routine generated by flex is declared
- using the macro YY_DECL. By redefining this macro you
- can change the routine's name and its calling sequence.
- For example, you could use:
-
- #undef YY_DECL
- #define YY_DECL float lexscan( a, b ) float a, b;
-
- to give it the name lexscan, returning a float, and
- taking two floats as arguments.
-
- - flex generates #line directives mapping lines in the
- output to their origin in the input file.
-
- - You can put multiple actions on the same line,
- separated with semi-colons. With lex, the following
-
- foo handle_foo(); return 1;
-
- is truncated to
-
- foo handle_foo();
-
- flex does not truncate the action. Actions that are
- not enclosed in braces are terminated at the end of the
- line.
-
- - Actions can be begun with %{ and terminated with %}. In
- this case, flex does not count braces to figure out
- where the action ends - actions are terminated by the
- closing %}. This feature is useful when the enclosed
- action has extraneous braces in it (usually in comments
- or inside inactive #ifdef's) that throw off the brace-
- count.
-
- - All of the scanner actions (e.g., ECHO, yywrap ...)
- except the unput() and input() routines, are written as
- macros, so they can be redefined if necessary without
- requiring a separate library to link to.
-
- FILES
- flex.skel
- skeleton scanner
-
- flex.fastskel
- skeleton scanner for -f and -F
-
- flexskelcom.h
- common definitions for skeleton files
-
- flexskeldef.h
- definitions for compressed skeleton file
-
- fastskeldef.h
- definitions for -f, -F skeleton file
-
- SEE ALSO
- lex(1)
-
- M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
-
- AUTHOR
- Vern Paxson, with the help of many ideas and much inspira-
- tion from Van Jacobson. Original version by Jef Poskanzer.
- Fast table representation is a partial implementation of a
- design done by Van Jacobson. The implementation was done by
- Kevin Gong and Vern Paxson.
-
- Thanks to the many flex beta-testers, especially Casey Lee-
- dom, Nick Christopher, Chris Faylor, Eric Goldman, Craig
- Leres, Mohamed el Lozy, Esmond Pitt, Jef Poskanzer, and Dave
- Tallman. Thanks to John Gilmore, Bob Mulcahy, Rich Salz,
- and Richard Stallman for help with various distribution
- headaches.
-
- Send comments to:
-
- Vern Paxson
- Real Time Systems
- Bldg. 46A
- Lawrence Berkeley Laboratory
- 1 Cyclotron Rd.
- Berkeley, CA 94720
-
- (415) 486-6411
-
- vern@lbl-{csam,rtsg}.arpa
- ucbvax!lbl-csam.arpa!vern
-
-
- DIAGNOSTICS
- flex scanner jammed - a scanner compiled with -s has encoun-
- tered an input string which wasn't matched by any of its
- rules.
-
- flex input buffer overflowed - a scanner rule matched a
- string long enough to overflow the scanner's internal input
- buffer (as large as BUFSIZ in "/usr/include/stdio.h"). You
- can edit flexskelcom.h and increase YY_BUF_SIZE and
- YY_MAX_LINE to increase this limit.
-
- REJECT used and scanner was not generated using -r - just
- like it sounds. Your scanner uses REJECT. You must run flex
- on the scanner description using the -r flag.
-
- old-style lex command ignored - the flex input contains a
- lex command (e.g., "%n 1000") which is being ignored.
-
- BUGS
- Use of unput() or input() trashes the current yytext and
- yyleng.
-
- Use of unput() to push back more text than was matched can
- result in the pushed-back text matching a beginning-of-line
- ('^') rule even though it didn't come at the beginning of
- the line.
-
- Nulls are not allowed in flex inputs or in the inputs to
- scanners generated by flex. Their presence generates fatal
- errors.
-
- Do not mix trailing context with the '|' operator used to
- specify that multiple rules use the same action. That is,
- avoid constructs like:
-
- foo/bar |
- bletch |
- bugprone { ... }
-
- They can result in subtle mismatches. This is actually not
- a problem if there is only one rule using trailing context
- and it is the first in the list (so the above example will
- actually work okay). The problem is due to fall-through in
- the action switch statement, causing non-trailing-context
- rules to execute the trailing-context code of their fellow
- rules. This should be fixed, as it's a nasty bug and not
- obvious. The proper fix is for flex to spit out a
- FLEX_TRAILING_CONTEXT_USED #define and then have the backup
- logic in a separate table which is consulted for each rule-
- match, rather than as part of the rule action. The place to
- do the tweaking is in add_accept() - any kind soul want to
- be a hero?
-
- The pattern:
-
- x{3}
-
- is considered to be variable-length for the purposes of
- trailing context, even though it has a clear fixed length.
-
- Due to both buffering of input and read-ahead, you cannot
- intermix calls to, for example, getchar() with flex rules
- and expect it to work. Call input() instead.
-
- The total table entries listed by the -v flag excludes the
- number of table entries needed to determine what rule has
- been matched. The number of entries is equal to the number
- of DFA states if the scanner was not compiled with -r, and
- greater than the number of states if it was.
-
- The scanner run-time speeds have not been optimized as much
- as they deserve. Van Jacobson's work shows that they can go
- quite a bit faster still.